Integrating Cognitive and Psychometric Models
to  Measure  Document  Literacy 


Abstract 


The Survey of Young Adult Literacy conducted in 1985 by the
National Assessment of Educational Progress included sixty-three
items that elicited skills in acquiring and using information from
written documents. These items were analyzed in two distinct
ways: (1) with an item response theory (IRT) model, which
characterized items' difficulties and respondents' proficiencies
as revealed simply by tendencies toward correct response, and (2)
with a qualitative cognitive model, which characterized items in terms
of the processing tasks they required. This paper describes
how a generalization of Fischer and Scheiblechner's Linear
Logistic Test Model can be used to integrate information from the
cognitive analysis into the IRT analysis.


Subject Terms: Bayesian estimation; cognitive processing models;
Item Response Theory; Linear Logistic Test Model; literacy


1.0  Introduction 


Perhaps  the  most  important  thrust  in  educational  measurement 
today  is,  in  Burstein's  (1983)  words,  "...  linking  achievement 
testing  to  the  cognitive  processes  employed  in  giving  test 
responses and to the instructional experiences of students."
Standard item-response theory and classical true-score
psychometric models, while often providing practically useful
summaries  of  the  overall  proficiencies  of  examinees  and  of  the 
relative  difficulties  of  items,  do  not  do  this.  Cognitive- 
processing  models,  on  the  other  hand,  are  typically  qualitative, 
descriptive,  and  poorly  suited  to  the  broadly  cast  decision-making 
problems  often  encountered  in  educational  practice.  A  recent  line 
of  development,  therefore,  has  been  to  study  the  characteristics 
of  psychometric  items  as  cognitive  tasks,  using  psychometric 
theory to summarize test data for action but cognitive theory to
construct  and  analyze  the  test  (Embretson,  1985). 

This paper describes the implementation of such an approach
in the construction and analysis of the Document Literacy scale in
the Survey of Adult Literacy (Kirsch and Jungeblut, 1986), a study
carried out under the auspices of the National Assessment of
Educational Progress. After a brief overview of the Adult
Literacy project, we outline (i) a cognitive-processing model
proposed for solving the exercises, (ii) a psychometric model for
the test, and (iii) a structure relating item parameters in the
psychometric model to item features that are salient in the
cognitive model, based on Mislevy's (1988) extension of
Scheiblechner's (1972) and Fischer's (1973) linear logistic test
model (LLTM).

2.0  An  Overview  of  the  NAEP  Literacy  Assessment 

In 1984, the U.S. Department of Education provided funding
for a nationwide assessment of the literacy skills of America's
young adults, ages 21 through 25. The assessment was designed and
carried out by the National Assessment of Educational Progress
(NAEP) over the three-year period from 1984 to 1986. A major
innovation  of  the  NAEP  design  was  to  call  for  a  set  of  literacy 
tasks  that  simulate  the  diverse  literacy  demands  of  adult 
interactions  in  occupational,  social,  and  educational  settings. 
Implementation  of  this  design  led  to  a  definition  of  literacy  that 
encompassed  three  distinct  skill  areas: 

o document literacy -- the skills needed to locate and use
information contained in non-prose formats such as forms, tables,
charts, signs/labels, indexes, schematics, and catalogues;

o prose literacy -- the skills needed to understand and use
information from texts such as editorials, news stories, and poems;
and

o quantitative literacy -- the skills needed to perform
arithmetic operations that are embedded in printed materials such
as check book registers, order forms, and loan advertisements.

NAEP developed a total of ninety-three literacy tasks,
sixty-three of which were classified as measuring document
literacy, fifteen as measuring prose literacy, and fifteen as
measuring quantitative literacy. Most involved open-ended
responses.  For  example,  respondents  were  directed  to:  fill  in  a 
deposit  slip;  determine  eligibility  from  a  table  of  employee 
benefits;  fill  out  an  order  form  taken  from  a  catalogue;  and 
follow  a  set  of  directions  to  travel  from  one  location  to  another 
using  a  map. 

Trained  interviewers  administered  the  literacy  tasks  to  a 
nationally representative household sample of approximately 3,600
young  adults  living  in  the  48  contiguous  United  States,  using  an 
item  sampling  design  under  which  each  task  was  administered  to 
approximately  1,500  respondents.  The  procedures  and  the  results 
of  the  assessment  are  detailed  in  Kirsch  &  Jungeblut  (1986).  In 
this  paper,  we  describe  a  secondary  analysis  that  was  conducted  to 
investigate  correlates  of  task  difficulty.  Due  to  the  small 
numbers  of  tasks  available  for  measuring  prose  literacy  and 
quantitative literacy, our analysis is restricted to the sixty-three
tasks which comprise the document literacy scale.

3.0  A  Cognitive  Model  for  Document  Literacy 

A  cognitive  processing  model  for  performance  on  document 
literacy  tasks  has  been  proposed  by  Kirsch  and  Mosenthal  (1988). 
The  model  posits  a  solution  process  that  can  be  summarized  in  the 
following  four  steps:  (1)  Identify  the  information  given  and 
requested  in  the  task  directive;  (2)  search  the  document  until  the 
requested  information  has  been  located;  (3)  make  a  match  between 
the information identified in the document and the information
requested in the directive; and (4) determine whether the match
adequately meets the criterion of the task.

As  part  of  an  earlier  study  of  the  factors  influencing 
document  task  difficulty,  Kirsch  and  Mosenthal  developed  a  system 
to  describe  the  complexity  and  organizational  structure  of 
documents  and  of  the  directives  associated  with  document  literacy 
tasks. This system, based on a significant revision of
Mosenthal's (1985) taxonomic grammar of the expository continuum,
characterizes the information contained in documents and document
task directives according to three basic levels of organization:
(1) the organizing category or OC, (2) the specific category or
SPE, and (3) the semantic feature. These three levels of
organization constitute three nested categories: semantic features
are properties of pieces of information that belong to specific
categories, which are nested within distinct organizing
categories. Specific categories can also be nested within other
specific categories. In fact, the more complex the document, the
more likely it will be to find several levels of nesting of SPEs.

To  illustrate  these  levels,  consider  the  medicine  label 
given  in  Figure  1.  This  document  has  three  organizing  categories: 
(1)  the  purpose  for  taking  the  medicine,  (2)  the  recommended 
dosage  levels,  and  (3)  the  list  of  cautions.  Within  the  "Purpose" 
OC are two SPEs, one specifying that the medicine can be taken for
"stuffed noses" and one specifying that it can also be taken for
"running noses". The "Dosage" OC also contains two SPEs, one
containing information specific to adult dosages and one
containing information specific to children's dosages. The
"Caution" OC, which is the most complex, contains four level-one
SPEs and three level-two SPEs. These levels are illustrated in
Figure  2,  which  provides  a  full  linguistic  representation,  or 
parsing,  of  the  medicine  label.  The  reader  should  see  Kirsch  and 
Mosenthal  (1988)  for  more  information  about  this  new  grammar. 

Insert  Figures  1  and  2  about  here 

Based  on  this  grammar,  Kirsch  and  Mosenthal  defined  a  number 
of  variables,  which,  according  to  the  processing  model,  would  be 
expected  to  correlate  with  task  difficulty.  These  variables  have 
been  classified  into  three  distinct  types:  (1)  Materials 
variables,  which  characterize  the  length  and  organizational 
complexity of the document to which a task refers; (2) Directive
variables, which characterize the length and organizational
complexity of the task directive; and (3) Process variables, which
characterize the difficulty of the task solution process.


The Materials variables are

(1) the number of OCs in the document;

(2) the number of OCs in the document that are embedded;

(3) the deepest level of embedding for an OC;

(4) the number of SPEs in the document;

(5) the number of SPEs in the document that are embedded; and

(6) the deepest level of embedding for an SPE.

The Directive variables are

(1) the number of OCs in the directive;

(2) the number of OCs in the directive that are embedded;

(3) the deepest level of embedding for an OC;

(4) the number of SPEs in the directive;

(5) the number of SPEs in the directive that are embedded; and

(6) the deepest level of embedding for an SPE.
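As a rough illustration of how counts of this kind could be derived from a parsed document, the following Python sketch represents a parse as a small tree and computes the SPE-related Materials variables. The class names, the placeholder labels, and the way the three level-two SPEs are distributed under the "Caution" category are assumptions made for illustration; the actual parse is the one in Figure 2, and "embedded" is taken here to mean an SPE nested inside another SPE.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SPE:
    """A specific category; may contain further (embedded) SPEs."""
    label: str
    children: List["SPE"] = field(default_factory=list)


@dataclass
class OC:
    """An organizing category holding level-one SPEs."""
    label: str
    spes: List[SPE] = field(default_factory=list)


def walk(spe, level=1):
    """Yield (spe, level) for an SPE and everything nested inside it."""
    yield spe, level
    for child in spe.children:
        yield from walk(child, level + 1)


def spe_counts(ocs):
    """Return (number of SPEs, number of embedded SPEs, deepest SPE level)."""
    levels = [lvl for oc in ocs for s in oc.spes for _, lvl in walk(s)]
    return len(levels), sum(1 for lvl in levels if lvl > 1), max(levels, default=0)


# Hypothetical parse of the medicine label; labels are placeholders and the
# placement of the level-two SPEs under "Caution" is assumed.
label = [
    OC("Purpose", [SPE("stuffed noses"), SPE("running noses")]),
    OC("Dosage", [SPE("adults"), SPE("children")]),
    OC("Caution", [
        SPE("caution 1", [SPE("caution 1a"), SPE("caution 1b")]),
        SPE("caution 2", [SPE("caution 2a")]),
        SPE("caution 3"),
        SPE("caution 4"),
    ]),
]

n_ocs = len(label)
n_spes, n_embedded_spes, deepest_spe = spe_counts(label)
print(n_ocs, n_spes, n_embedded_spes, deepest_spe)  # 3 11 3 2
```

The same traversal could be applied to a parsed directive to obtain the corresponding Directive variables.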

The  Process  variables  are  defined  as  follows: 

(1) Degree of Correspondence (DEGCORR). This variable refers to
the explicitness of the match between the information requested in
the directive or question and the corresponding information in the
text. It is scored on an integer scale ranging from one to five,
with higher values indicating less explicit correspondence and,
therefore, more difficulty. For example, tasks requiring a single
literal match are scored one, tasks requiring an inferential text-
based match are scored three, and tasks requiring matches based on
specialized prior knowledge are scored five.

(2) Type of Information (TYPINFO). This variable concerns the
type and number of restrictive conditions that must be held in
mind in identifying and matching features. It too is scored on a
one to five scale, with lower values indicating less restrictive
conditions.

(3) Plausibility of Distractors (DEGPLAUS). Document tasks
typically require the examinee to skim an entire document in order
to locate a piece of requested information. Since any piece of
information embedded in the document could be interpreted as the
requested information, the typical interpretation of the term
"distractor", that is, the incorrect alternatives given with a
multiple-choice item, is not appropriate for document tasks.
Instead, document task "distractors" include all pieces of
information embedded in the document. The degree of plausibility
of a distractor is measured by the extent to which the information
embedded in the document shares semantic information with the
correct answer to the question or directive, but does not satisfy
all conditions specified. This variable is scored on a one to
five scale with lower numbers indicating more shared semantic
information and higher numbers indicating less.

The  relationship  between  these  three  sets  of  variables  and 
the  four-step  processing  model  can  be  stated  as  follows:  The 
Directive  variables  characterize  the  difficulty  of  Step  1, 
identifying the information given and requested in the task
directive; the Materials variables characterize the difficulty of
Step 2, searching the document for requested information; and the
Process variables characterize the difficulty of Steps 3 and 4,
matching  information  and  determining  whether  the  criterion  of  the 
task  has  been  satisfied. 

Kirsch and Mosenthal (1988) succeeded in parsing sixty-one
of the sixty-three document tasks, then scored the sixty-one in
terms of the Materials, Directives, and Process variables using
the scoring instructions in the appendices of their report. The
results appear in Table 1; correlations among the variables appear
in Table 2. (Because the levels of OC and SPE embedding for the
document literacy task directives were almost entirely at the
first level, not all of the directive embedding variables were
tabulated.) Task 46 is based on the Medicine Label. The
reliability of the scoring was checked by training a third scorer
and observing the proportion of exact agreement in rescores of
one-third of the documents; the (very satisfactory) results are
given in Table 3.

Tables  1-3  about  here 

Kirsch and Mosenthal regressed task proportions-correct on
these task features in the total survey sample and in selected
subpopulations. An adjusted R² of .87 resulted, with the
strongest predictors being numbers and embedding of OCs, and the
plausibility  of  distractors.  This  result  provided  empirical 
confirmation  that  the  task  attributes  identified  by  the  processing 
model  did  indeed  largely  account  for  task  difficulty.  The 
analysis  addresses  only  average  difficulty  within  populations, 
however,  and  provides  no  link  between  individuals'  overall 
performance on the set of tasks and their expected success with
documents and tasks with varying structures -- the type of
information  required  to  target  instruction  to  individual  students 
and  to  design  documents  for  specified  types  of  users. 

4.0  A  Psychometric  Model  for  Measuring  Task  Difficulty 

In  contrast,  the  expected  outcomes  of  the  confrontations 
between particular examinees and tasks are addressed by the
response scaling methodology called item response theory (IRT;
Lord, 1980). Unidimensional IRT models express the probability
that an examinee will respond correctly to a particular test item
as a function of a single parameter that characterizes the
proficiency of the examinee, and one or more additional parameters
for each item that characterize measurement properties such as its
difficulty.  An  important  feature  of  IRT  scaling  is  that  the 
proficiency  levels  of  all  respondents  can  be  reported  on  the  same 
scale  even  when  different  individuals  have  been  administered 
different  subsets  of  tasks,  as  in  the  NAEP  literacy  assessment. 

In  this  paper,  we  use  the  Rasch  IRT  model  (Rasch,  1960)  to 
exemplify  the  process  of  measuring  task  difficulty  with  a 
psychometric model. Let $x_{ij}$ denote the response of examinee i to
task j. Assume that responses are dichotomously scored, with 1
indicating  a  correct  response  and  0  indicating  an  incorrect 
response.  The  standard  Rasch  model  gives  the  probability  of  a 
correct response as

$$P_j(\theta_i) = P(x_{ij} = 1 \mid \theta_i, \beta_j) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}, \qquad (1)$$

where $\beta_j$ characterizes the difficulty of task j and $\theta_i$
characterizes the proficiency of examinee i. Under the usual
assumption of conditional independence, the probability of a
respondent's pattern $x_i = (x_{i1}, \ldots, x_{in})'$ of responses to n tasks is
obtained as

$$P(x_i \mid \theta_i, \beta) = \prod_j P_j(\theta_i)^{x_{ij}} \, Q_j(\theta_i)^{1 - x_{ij}}, \qquad (2)$$

where $Q_j(\theta) = 1 - P_j(\theta)$ and $\beta = (\beta_1, \ldots, \beta_n)'$. The probability of a
data matrix $X = (x_1, \ldots, x_N)'$ of responses from N examinees
responding independently can be obtained as

$$P(X \mid \theta, \beta) = \prod_i P(x_i \mid \theta_i, \beta), \qquad (3)$$

where $\theta = (\theta_1, \ldots, \theta_N)'$. Once X has been observed, Equation 3 can
be interpreted as a likelihood function, and provides a basis for
estimating the parameters $\theta$ and $\beta$.
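The Rasch response probability and the resulting likelihood can be sketched in a few lines of Python. This is a minimal illustration rather than the estimation procedure used in the study: the parameter values are invented, and the convention of marking a task that was not administered to a respondent with None (to mimic an item sampling design) is an assumption of this sketch.

```python
import math


def p_correct(theta, beta):
    """Rasch probability of a correct response: exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def log_likelihood(X, thetas, betas):
    """Log of the data-matrix probability; entries of X that are None are skipped."""
    total = 0.0
    for x_i, theta in zip(X, thetas):
        for x_ij, beta in zip(x_i, betas):
            if x_ij is None:  # task not administered to this examinee (assumed convention)
                continue
            p = p_correct(theta, beta)
            total += math.log(p) if x_ij == 1 else math.log(1.0 - p)
    return total


betas = [-2.0, 0.0, 1.5]        # illustrative task difficulties
thetas = [0.0, 1.0]             # illustrative examinee proficiencies
X = [[1, 1, 0], [1, None, 1]]   # second examinee was not given task 2

print(round(p_correct(0.0, -2.0), 3))  # 0.881
print(log_likelihood(X, thetas, betas))
```

An examinee at the mean of the proficiency scale thus has a probability of roughly .88 of succeeding on a task with difficulty -2, which is why such a task counts as easy.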

Table 4 gives Rasch item parameter estimates obtained with
Mislevy and Bock's (1984) BILOG computer program for the sixty-one
literacy tasks that were parsed, on a scale in which the
distribution of $\theta$ has a mean of zero and a standard deviation of
one. Shown with estimates of the difficulty parameters are their
(approximated) standard errors of estimation, or $\sigma$. Item 46 is
the Medicine Label item, which with a difficulty parameter
estimate of -2 is one of the easier items. A value of $\theta$ could be
estimated  for  any  respondent,  and,  via  (1),  the  expectation  of  a 
correct  response  from  that  respondent  to  this  item  or  any  other 
could  be  calculated. 

Table  4  about  here 

IRT models such as the Rasch model are widely accepted as
useful tools for creating and analysing tests, adding precision
and flexibility to the ways that examinees' proficiencies can be
measured and compared. Note, however, that these models make no
reference to the cognitive processes which an examinee must employ
in order to have a high probability of making a correct response;
nor do they address the features of tasks that make them
difficult. The model parameters merely indicate the relative
proficiencies of respondents ($\theta$) and the relative difficulties of
tasks ($\beta$) in the skill area considered.

5.0  An  Integrated  Approach 

In  a  pioneering  step  toward  integrating  cognitive  and 
psychometric  models,  Scheiblechner  (1972)  and  Fischer  (1973) 
posited  a  constrained  Rasch  model  for  item  responses,  the  Linear 
Logistic Test Model (LLTM). In this model, task difficulty
parameters  are  estimated  as  linear  combinations  of  a  smaller 
number  of  more  elementary  components.  The  elementary  components 
are  defined  to  reflect  differences  in  the  cognitive  processing 
demands  of  the  tasks.  This  approach  represents  a  significant 
advance  beyond  standard  IRT  procedures,  because  it  exploits 
auxiliary information about the cognitive processing demands of
tasks  to  address  why  some  tasks  are  more  difficult  than  others. 

To  apply  the  LLTM  to  a  set  of  test  data,  the  usual  response 
matrix  X  must  be  augmented  with  information  pertaining  to  the 
processing  demands  of  each  test  item.  This  information  is 
expressed  in  terms  of  a  set  of  K  variables  characterizing  features 
of  the  items  which  are  salient  in  the  cognitive  processing  model. 
Examples  include  (i)  Fischer's  (1973)  calculus  example,  in  which 
items  are  characterized  in  terms  of  the  number  and  type  of 
operations  a  pupil  must  carry  out  in  order  to  solve  a 
differentiation problem, and (ii) the document literacy variables
which were defined in the previous section. Let $q_{1j}, \ldots, q_{Kj}$ denote
the item feature variables defined for the jth item. The LLTM
assumes a Rasch model for task difficulty, but constrains the
difficulty parameters $\beta_j$ as follows:

$$\beta_j = \sum_{k=1}^{K} q_{kj} \eta_k \quad \text{for } j = 1, \ldots, n, \qquad (4)$$

or, in matrix notation, $\beta = Q'\eta$, where $Q'$ is an n by K matrix of
item feature data and $\eta = (\eta_1, \ldots, \eta_K)'$.
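The LLTM constraint, in which each item difficulty is a weighted sum of that item's feature scores, can be illustrated with a short Python sketch. The feature matrix and the weights below are invented for illustration (the variable names `features` and `eta` are assumptions of this sketch); the point is only that items with identical feature rows necessarily receive identical difficulties.

```python
def lltm_difficulties(features, eta):
    """Return one difficulty per item: the sum over features k of q_kj * eta_k.

    features holds one row of K feature scores per item (an n-by-K layout).
    """
    return [sum(q * e for q, e in zip(row, eta)) for row in features]


# Hypothetical feature data: 4 items scored on K = 2 features.
features = [
    [1.0, 0.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [1.0, 1.0],  # same features as the second item
]
eta = [0.5, -0.25]  # illustrative weights for the two features

beta = lltm_difficulties(features, eta)
print(beta)  # [0.5, 0.25, 0.75, 0.25]
```

The second and fourth items share a feature row and therefore share a difficulty, which is the restriction that the less constrained approach described later relaxes.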

The  original  goal  of  explaining  all  of  the  reliable 
variation  in  item  parameters  by  item  features  was  not  realized 
(Fischer  and  Formann,  1982),  as  rigorous  tests  of  the  sufficiency 
of  the  LLTM  against  the  unconstrained  model  failed  with  few 
exceptions.  It  was  often  possible,  however,  to  account  for  large 
portions  of  variation  among  item  difficulties  in  terms  of 
substantively  meaningful  item  features,  thus  providing  insights 
into  the  effects  of  educational  treatments  and  helping  to  identify 
flawed  items  as  unexpectedly  easy  or  hard  in  light  of  the  features 
that  were  expected  to  determine  their  operating  characteristics. 

A  less  restrictive  method  for  incorporating  cognitive 
processing  information  into  a  psychometric  model  has  been  proposed 
by  Mislevy  (1988).  This  alternative  approach  combines  key  aspects 
of  the  LLTM  with  the  exchangeability  concept  of  Bayesian  inference 
(Lindley  &  Novick,  1981).  As  in  the  LLTM,  differences  in  the 
cognitive  processing  demands  of  tasks  are  accounted  for  by 
regressing task difficulty on a smaller set of more elementary
components. Unlike the LLTM, however, parameter estimates
obtained  from  the  fitted  regression  model  are  not  expected  to 
account  for  all  of  the  variation  in  true  task  difficulties. 
Instead,  the  expectation  that  true  task  difficulties  will  be 
distributed  about  the  central  values  predicted  by  the  fitted 
regression  model  is  accounted  for  by  (i)  positing  that  the 
difficulty  parameters  of  tasks  with  similar  values  of  the  item 
feature  variables  are  exchangeable  members  of  a  common  population; 
and (ii) imposing this task-population structure on the task
difficulties,  by  means  of  Bayesian  prior  distributions. 

In  Mislevy's  (1988)  implementation  of  the  approach,  the 
prior  distribution  for  individual  task  difficulties  was  assumed  to 
be multivariate normal with mean $Q'\eta$ and variance $\phi^2 I$, where the
mean structure is defined as in the LLTM. This model was fitted
as a two-stage empirical Bayes (EB) regression model:
unconstrained difficulty parameters for individual tasks (as in
Table 4), estimated in the first stage, provide data from which to
estimate the unknown parameters $\eta$ and $\phi^2$ of the assumed
item-parameter distribution in the second stage. Computational details
are provided in that reference. Final task difficulty estimates
$\tilde{\beta}_j$ are precision-weighted combinations of the unrestricted Rasch
estimates $\hat{\beta}_j$ and the regression estimates $q_j'\eta$:

$$\tilde{\beta}_j = \frac{w_{1j}\, q_j'\eta + w_{2j}\, \hat{\beta}_j}{w_{1j} + w_{2j}},$$

where $w_{1j} = 1/\phi^2$ and $w_{2j} = 1/\sigma_j^2$. The final task difficulty
estimates can be viewed as a compromise between LLTM estimates,
where items with identical features are constrained to have
identical difficulty estimates, and standard Rasch difficulty
estimates, where information about item features is ignored.
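The precision-weighted compromise can be made concrete with a small Python sketch. It covers only the final combination step, taking the regression prediction and the residual variance as given; the function name, argument names, and numbers are assumptions chosen for illustration, and the second-stage estimation of the regression coefficients and residual variance is not shown.

```python
def eb_estimate(beta_hat, se, predicted, phi2):
    """Precision-weighted combination of a Rasch estimate and a regression prediction.

    beta_hat  : unrestricted Rasch difficulty estimate for the task
    se        : its standard error of estimation
    predicted : difficulty predicted from the task's features
    phi2      : residual variance of difficulties about the predictions
    """
    w1 = 1.0 / phi2      # weight on the regression prediction
    w2 = 1.0 / se ** 2   # weight on the unrestricted Rasch estimate
    return (w1 * predicted + w2 * beta_hat) / (w1 + w2)


# Illustrative values only.
equal_weights = eb_estimate(beta_hat=-2.0, se=0.5, predicted=-1.0, phi2=0.25)
precise_rasch = eb_estimate(beta_hat=-2.0, se=0.1, predicted=-1.0, phi2=0.25)
print(equal_weights)   # -1.5
print(precise_rasch)   # close to the Rasch estimate of -2
```

When the Rasch estimate is as uncertain as the regression prediction, the result falls halfway between them; as the standard error shrinks, the result moves toward the Rasch estimate.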

Like  the  LLTM,  this  approach  provides  a  link  between  the 
cognitive  processing  model  assumed  to  be  influencing  task 
responses  and  the  tasks’  resulting  difficulties.  To  the  extent 
that  the  structural  model  for  item  parameters  fits,  it  provides  a 
basis  for  understanding  just  what  makes  items  difficult.  It  is  a 
powerful  argument  for  the  construct  validity  of  a  test  if  it  can 
be  shown  that  item  difficulties  are  determined  predominantly  by 
manipulable  features  in  a  cognitive  model  built  around  the  skills 
intended to be measured (Embretson, 1985). To the extent that the
model does not fit, it identifies unexpectedly hard or easy items,
information that should prove useful for item construction.

6.0  Application  to  the  Document  Literacy  Scale 

As  described  above,  both  the  cognitive  processing  analysis 
and  the  psychometric  analysis  were  first  applied  to  the  Document 
Literacy  data  separately.  The  variables  in  Table  1,  resulting 
from  parsing  the  tasks,  signify  salient  features  of  the  items  as 
indicated  by  the  cognitive  processing  model,  and  provide  insights 
into  their  processing  requirements.  The  unrestricted  Rasch 
difficulty estimates ($\hat{\beta}$) in Table 4 indicate the difficulty of
tasks from a purely empirical point of view. We now apply the
integrated model described in the preceding section.

In considering variables to include in the augmented data
matrix, Kirsch and Mosenthal's (1988) results were used to
eliminate three of the parsing variables: (i) the deepest level
of OC embedding in the Materials, (ii) the deepest level of SPE
embedding in the Materials, and (iii) the deepest level of OC
embedding in the Directives. Univariate distributions were
tabulated for the nine remaining item feature variables, and
transformations were applied to eliminate extreme asymmetries: a
square root transformation for the "Number of OC's" variable, a
logarithmic transformation for "Number of SPE's", and logit
transformations for "Number of Embedded OC's" and "Number of
Embedded SPE's" after expressing them as proportions of total OCs
and SPEs respectively. In addition, both the Materials variables
and the Directive variables were centered and scaled to have a
mean of zero and variance 1. Because the Process variables
represent ordered categories, rather than counts, these variables
were centered by recoding the original values of 1 to 5 as -1 to
3. These rescaled variables were used in all subsequent analyses.
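The transformations just described can be sketched as follows. The raw counts are invented, and two details are assumptions of this sketch rather than documented choices: a small continuity correction (`eps`) keeps the logit finite when a proportion is exactly 0 or 1, and the population standard deviation is used for scaling.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Center to mean zero and scale to variance one (population SD assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def logit_proportion(count, total, eps=0.5):
    """Logit of count/total; eps is an assumed continuity correction so that
    proportions of exactly 0 or 1 stay finite."""
    p = (count + eps) / (total + 2 * eps)
    return math.log(p / (1.0 - p))


# Hypothetical raw counts for five tasks.
n_ocs = [1, 4, 9, 16, 25]
n_spes = [2, 8, 20, 45, 110]
n_embedded_spes = [0, 2, 5, 30, 55]

z_ocs = standardize([math.sqrt(v) for v in n_ocs])
z_spes = standardize([math.log(v) for v in n_spes])
z_emb = standardize([logit_proportion(c, t) for c, t in zip(n_embedded_spes, n_spes)])

# Process variables: the recoding stated in the text (1..5 becomes -1..3).
degcorr = [1, 3, 5]
degcorr_recoded = [v - 2 for v in degcorr]
print(z_ocs, degcorr_recoded)
```

Putting the count variables on a common standardized scale is what allows the magnitudes of the fitted coefficients to be compared across variables.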

The  parameter  estimates  obtained  from  fitting  a  two-stage 
Empirical  Bayes  regression  model  to  these  data  are  given  in  Table 
5.  They  include  the  estimated  coefficients  for  the  intercept  term 
and the nine item feature variables ($\eta_0, \eta_1, \ldots, \eta_9$), and the
estimated standard deviation for the normal distribution of
residuals of the task difficulty parameters from their expected
values. Because the model was estimated from standardized data,
the magnitudes of the coefficients provide an indication of the
relative contribution of each variable to expected difficulty.

Insert  Table  5  about  here 


To  further  investigate  the  contribution  of  each  item  feature 
variable  to  variation  in  predicted  task  difficulty,  three 
alternative  models  were  estimated:  (1)  a  model  that  excluded  the 
Materials  variables;  (2)  a  model  that  excluded  the  Directive 
variables;  and  (3)  a  model  that  excluded  the  Process  variables. 
The  estimated  coefficients  for  these  three  alternative  models  are 
also  shown  in  Table  5.  Note  the  similarity  of  the  coefficients 
listed  for  the  Materials  variables  in  the  Full  model  and  in  the 
model which excluded the Directive variables (Model #2), and the
similarity of the coefficients listed for the Directive variables
in the Full model and in the model which excluded the Materials
variables (Model #1). These similarities are a result of the low
correlation between the Materials variables and the Directive
variables. By contrast, the coefficients of both the Materials
variables and the Directive variables changed from the Full model
to the model which excluded the Process variables (Model #3).
These changes are a result of the higher correlations between the
Process variables and the Materials variables and between the
Process variables and the Directive variables. Because Model #3
is not contaminated by Process variable correlation, its
coefficients provide the most accurate picture of the relative
contributions to predicted task difficulty provided by the
Materials variables and the Directive variables. In particular,
when the Process variables are excluded, task difficulty increases
most rapidly with the number of SPEs in the Materials and the number of
SPEs in the Directive. Increasing the number of OCs in the
Directive and in the Materials also increases task difficulty, but
not by as much. By far the smallest contribution to task
difficulty is provided by the OC and SPE embedding variables.

Table 5 also lists approximate R² values for each model. In
the standard regression setting, the R² statistic is calculated as
the ratio of explained variation to total variation. In this
application, true task difficulties are unobservable, so total
variation is approximated using the variation observed in the EB
estimates $\tilde{\beta}$. Several conclusions can be drawn from the R² values.
First, differences in the cognitive processing demands of document
literacy tasks, as measured by the cognitive processing variables
proposed by Kirsch and Mosenthal, account for approximately 80%  of
the observed variation in task difficulty. Second, the largest
contribution to explained variation is provided by the Process
variables. When these variables were excluded from the model, the
R² statistic dropped by more than 20 points. This indicates that
the Process variables are tapping an aspect of task difficulty
that is not well predicted by either the Materials variables or
the Directive variables. Third, the five-point decreases in the
R² values listed for Alternative Models #1 and #2 indicate that
both the Materials variables and the Directive variables are also
measuring unique aspects of task difficulty. Thus, although the
Process variables appear to be the most important, neither the
Materials variables nor the Directive variables can be excluded
without diminishing predictive capability.
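One simple way to compute an approximate R² of this kind is sketched below. It treats the variation in a set of difficulty estimates as the total and the squared gaps between estimates and feature-based predictions as the unexplained part; whether this matches the exact approximation behind Table 5 is an assumption, and the numbers are invented.

```python
def approx_r2(estimates, predictions):
    """Share of variation in the estimates that the predictions reproduce."""
    m = sum(estimates) / len(estimates)
    total = sum((b - m) ** 2 for b in estimates)
    residual = sum((b - p) ** 2 for b, p in zip(estimates, predictions))
    return 1.0 - residual / total


# Hypothetical difficulty estimates and feature-based predictions for five tasks.
estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
predictions = [-1.5, -1.0, 0.0, 1.0, 1.5]

r2 = approx_r2(estimates, predictions)
print(round(r2, 2))  # 0.95
```

Comparing this statistic between a full model and a model that omits one block of variables gives the kind of point drop discussed above.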

Figure  3  plots  the  residuals  obtained  from  fitting  the  full 
model  against  percent  correct.  Negative  residuals  indicate  that 
the  task  was  easier  than  predicted,  that  is,  easier  than  other 
tasks  with  similar  values  of  the  item  feature  variables.  The  plot 
shows a scatter of low positive and negative residuals among tasks
with  percent  correct  values  below  90  percent.  This  suggests  that 
the  item  feature  variables  have  been  successful  at  predicting  task 
difficulty  among  tasks  with  low  percent  correct  values.  However, 
several  high  negative  residuals  occur  among  the  tasks  with  percent 
correct  values  above  90  percent.  This  suggests  that  the  item 
feature  variables  have  not  provided  useful  information  pertaining 
to  gradations  of  difficulty  among  extremely  easy  tasks.* 
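
The sign convention for residuals can be illustrated with a small Python sketch. It assumes a single predictor fitted by ordinary least squares, which is a simplification of the two-stage Empirical Bayes regression used in the paper. The feature scores and difficulty values are hypothetical numbers chosen for illustration, not survey data, and `fit_line` is an illustrative helper.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical feature scores and difficulty estimates (logits); not survey data.
feature = [1, 2, 3, 4, 5]
difficulty = [-3.0, -1.0, -0.5, 0.2, 0.3]

intercept, slope = fit_line(feature, difficulty)

# Residual = observed difficulty minus predicted difficulty.
residuals = [d - (intercept + slope * f) for f, d in zip(feature, difficulty)]

# A negative residual marks a task that was easier than its features predict.
easier_than_predicted = [i for i, r in enumerate(residuals) if r < 0]
print(easier_than_predicted)
```

In this toy data set the first and last tasks have negative residuals, so they are the ones that would be read as easier than predicted.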

Insert  Figure  3  about  here 

7.0  Discussion 

The two-stage Empirical Bayes regression model provides a
link between Kirsch and Mosenthal's cognitive model for solving
document literacy tasks and the psychometric IRT model for task
difficulty. The integrated approach led to the following
findings: (i) document literacy task difficulty was highly related
to the Process variables and somewhat less related to the
Materials variables and the Directive variables; and (ii) the
cognitive model for explaining task difficulty was deficient in
explaining gradations of difficulty among extremely easy tasks.
Of course, these results are based on only the present data, in
which a regression model with nine independent variables was
effectively fit to sixty-one observations. Extensions of the
literacy survey currently in progress, however, should yield
response data on as many as a hundred new document literacy tasks
written to similar specifications. If these subsequent assessments
reveal similar findings, an examination of tasks with large
negative residuals will be conducted in order to determine factors
associated with extremely easy document literacy tasks. Knowledge
of such factors should prove useful for document design and
construction.

*This explains why the R² is slightly lower in this analysis than in
Kirsch and Mosenthal's regression analysis of percents-correct: task
features account poorly for differences among easy items, which are
minimized in the percent-correct metric but expanded in the Rasch
difficulty (logit) metric.
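
The footnote's point about the two metrics can be checked with a little arithmetic. The Python sketch below is illustrative only: it applies the log-odds transformation to percents-correct and treats the negative log-odds as a stand-in for Rasch difficulty, which ignores the proficiency distribution and the matrix-sampling design of the survey. The function name and the chosen percentages are assumptions made for the example.

```python
import math

def logit_difficulty(percent_correct):
    """Negative log-odds of a correct response; harder tasks get larger values."""
    p = percent_correct / 100.0
    return -math.log(p / (1.0 - p))

# Two pairs of tasks, each pair two percentage points apart.
gap_easy = logit_difficulty(97) - logit_difficulty(99)   # both very easy
gap_mid = logit_difficulty(50) - logit_difficulty(52)    # both mid-range

print(f"97% vs 99% correct: {gap_easy:.2f} logits apart")
print(f"50% vs 52% correct: {gap_mid:.2f} logits apart")
```

The same two-point difference spans about 1.12 logits near the ceiling but only about 0.08 logits near 50 percent, so unexplained variation among very easy tasks weighs more heavily in a regression on the logit scale than in one on percents-correct.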

It is increasingly recognized that high reliability coefficients
alone do not guarantee a "good" test, nor do high predictive
relationships guarantee a "valid" one. The onus has been placed
(appropriately!) upon the tester to demonstrate that the skills
tapped in an educational test are in fact those deemed important
to measure. The two-stage approach exemplified in this paper
capitalizes upon advances in the psychometric and cognitive
disciplines to address this need. IRT models, which provide
measures of overall proficiency for making decisions about
individual examinees, also define implicitly the variable being
measured through implications of correct response at the various
levels of proficiency. A demonstration that this empirical
characterization of proficiency can be largely accounted for by
the key features of items from the perspective of a cognitive
model argues strongly for the construct validity of the measure,
constitutes a theoretical foundation for further item development,
and provides an additional means of detecting items that tap
irrelevant skills.

Cognitive Processing Variables
for 61 Document Literacy Tasks

[Table body not legible in the source.]

Table 2

Intercorrelations among Item Features

                                    Materials                           Directives        Process
                                     (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)

Materials
  (1) No. of OCs                    1.00   .25   .09   .52   .31  -.20  -.00   .13  -.10  -.45  -.40  -.19
  (2) No. of OCs Embedded                 1.00   .74   .18  -.18  -.05   .29   .41  -.04  -.02  -.29  -.34
  (3) Levels of OC Embeddings                   1.00   .12  -.12   .15   .41   .44   .03   .03  -.16  -.21
  (4) No. of SPEs                                     1.00   .25  -.23   .31   .11   .17  -.15  -.53  -.39
  (5) No. of SPEs Embedded                                  1.00   .26  -.15  -.13  -.02   .08  -.05   .00
  (6) Levels of SPE Embeddings                                    1.00  -.13  -.17   .08  -.08   .09   .09

Directives
  (7) No. of OCs                                                        1.00   .50   .50  -.07  -.41  -.32
  (8) Levels of OC Embeddings                                                 1.00  -.06  -.03  -.22  -.21
  (9) No. of SPEs                                                                   1.00  -.02  -.40  -.46

Process
 (10) Degrees of Correspondence                                                           1.00  -.38  -.62
 (11) Type of Information                                                                       1.00  -.03
 (12) Plausibility of Distractors                                                                     1.00

Table 3


Proportions of Exact Agreement Among Raters

Variable                          Proportion of Agreement

Materials Variables
  Number of OCs                           100%
  Number of Embedded OCs                  100%
  Level of OC Embedding                    98%
  Number of SPEs                           96%
  Number of Embedded SPEs                  93%
  Level of SPE Embedding                   88%

Directive Variables
  Number of OCs                            96%
  Level of OC Embedding                    99%
  Number of SPEs                           90%

Process Variables
  Degrees of Correspondence                95%
  Type of Information                      86%
  Plausibility of Distractors              90%

Table 4


Results of Fitting an Unrestricted Rasch Model

Task  Difficulty  Std. Err.  % Correct      Task  Difficulty  Std. Err.  % Correct

   1      -4.051      0.120         99        31      -1.110      0.054         79
   2      -3.503      0.088         98        32      -2.128      0.047         91
   3      -3.277      0.126         97        33      -2.412      0.053         94
   4      -3.198      0.121         97        34      -0.912      0.051         76
   5      -3.468      0.147         96        35      -0.201      0.047         56
   6      -2.638      0.058         96        36      -1.016      0.053         80
   7      -4.153      0.218         96        37      -2.233      0.078         94
   8      -2.914      0.110         94        38      -2.641      0.093         96
   9      -2.758      0.098         94        39      -1.157      0.055         81
  10      -1.967      0.0?0         ?1        40      -2.129      0.0??         9?
  11      -1.590      0.060         89        41      -2.920      0.110         94
  12      -1.104      0.053         81        42      -1.842      0.067         90
  13      -2.247      0.078         92        43      -1.894      0.068         90
  14      -1.252      0.056         80        44      -1.819      0.066         89
  15      -1.217      0.057         80        45      -1.883      0.068         91
  16      -0.420      0.048         68        46      -2.062      0.071         90
  17      -0.384      0.046         68        47      -1.133      0.053         78
  18      -1.802      0.066         88        48      -1.245      0.055         79
  19      -0.613      0.048         69        49      -1.409      0.057         85
  20      -0.203      0.046         62        50      -1.884      0.069         86
  21       0.294      0.045         48        51      -2.413      0.083         94
  22      -0.471      0.047         67        52      -1.783      0.066         89
  23      -1.734      0.063         89        53      -1.365      0.057         84
  24      -1.968      0.068         92        54      -1.622      0.062         37
  25      -1.896      0.066         90        55      -1.095      0.054         81
  26      -0.457      0.047         67        56       0.115      0.046         52
  27      -1.712      0.063         88        57      -0.467      0.047         62
  28      -1.860      0.066         88        58      -0.162      0.046         63
  29      -0.749      0.049         73        59       1.244      0.053         28
  30      -0.567      0.048         68        60       0.055      0.046         59
                                              61      -2.726      0.096         97

Note: Rasch difficulty estimates are not strictly monotonically related to
proportions correct in this analysis because of the matrix-sampling data
collection design; the percents-correct reflect performance in different
randomly equivalent samples of respondents. A question mark denotes a digit
that is illegible in the source.

Table 5


Estimated Regression Parameters
and Approximate R² Values

                               Full         Alternative Models
Variable         Type         Model        #1        #2        #3

Intercept                    -1.404    -1.462    -1.409    -1.603
No. OCs          MAT         -0.096       e      -0.191     0.157
No. Emb. OCs     MAT          0.024       e       0.048     0.069
No. SPEs         MAT          0.383       e       0.442     0.459
No. Emb. SPEs    MAT          0.159       e       0.090     0.099
No. OCs          DIR          0.212     0.210       e       0.245
No. SPEs         DIR          0.144     0.163       e       0.364
TYPINFO          PROC         0.268     0.351     0.327       e
DEGPLAUS         PROC         0.202     0.229     0.264       e
DEGCORR          PROC         0.360     0.285     0.372       e

Std. Dev.                     0.467     0.538     0.534     0.689
Approximate R²                   81        75        76        59

e - variable was intentionally excluded from the model

For Stuffed and Running Noses:


Dosage:

Adults - 2 teaspoons every 4 hours;
Children over 6 years - 1 teaspoon
every 4 hours.

Caution:

Unless directed by physician, do not
exceed recommended dosage. If drowsiness
occurs, do not drive or operate
dangerous machinery. Individuals
with high blood pressure, heart disease,
diabetes, or thyroid disease should use
only as directed by a physician.


Figure 1. The Medicine Label document.

*|\OC purpose

|  |\SPE For Stuffed Noses
|  AND \SPE For Running Noses
*AND | \OC Dosage
|  |\SPE *take
|  |  |\AG Adults
|  |  |\OBJ teaspoons
|  |  |  \ATT 2
|  |  \TEMP hours
|  |  |\ATT 4
|  \ATT every
|  AND \SPE *take
|  |\AG children
|  |  \ATT over six
|  |\ATT teaspoon
|  |  \ATT 1
|  \TEMP hours
|  |\ATT 4
|  \ATT every
*AND \OC caution
|\SPE do exceed
|  *|\AG you
|  |\OBJ dosage
|  |  \ATT recommended
|  |\NEG not
unless | COND \SPE directed
|  |\AGT by physician
|  *\OBJ you
*AND {\SPE do drive
OR {\SPE do operate
|  *|\AG you
|  |\OBJ machinery
|  |  \ATT dangerous
|  |\NEG not
If | COND \SPE occurs
|  \AG drowsiness
*AND \SPE should use
|\AG individuals with
|  *OR|\ATT blood pressure
|  |  \ATT high
|  *OR|\ATT heart disease
|  |  *\ATT high
|  *OR|\ATT diabetes
|  |  *\ATT high
|  OR \ATT thyroid disease
|  *\ATT high
as COND \SPE directed
|\MAN only
\AG by physician


Figure 2. A parsing of the Medicine Label document.